“robust” estimates. In this paper we show that, for a class of functions
$V$, using these robust estimators corresponds to assuming that data
are corrupted by Gaussian noise whose variance fluctuates according
to some given probability distribution, which uniquely determines the
shape of $V$.

©  Massachusetts  Institute  of  Technology,  1991 

This paper describes research done within the Center for Biological Information
Processing, in the Department of Brain and Cognitive Sciences, and at the
Artificial Intelligence Laboratory. This research is sponsored by a grant from
the Office of Naval Research (ONR), Cognitive and Neural Sciences Division; by
the Artificial Intelligence Center of Hughes Aircraft Corporation (Sl-801534-2).
Support for the A. I. Laboratory's artificial intelligence research is provided
by the Advanced Research Projects Agency of the Department of Defense under
Army contract DACA76-85-C-0010, and in part by ONR contract N00014-85-K-0124.




1  Introduction 

A common problem in statistics is the following: given $n$ noisy observations
$g_i$ of the same quantity $f$, give an estimate of $f$. A typical solution
to this problem consists in choosing the value of $f$ that maximizes the
likelihood function $P(g|f)$, that is the probability of having observed the
data $g = (g_1, \dots, g_n)$ if the true value was $f$. Estimates of this type
are named Maximum Likelihood (ML) estimates, and rely on the assumption that we
know the likelihood function $P(g_1, \dots, g_n|f)$, which is essentially a model
of how noise affected the measurement process.

A common assumption is that of additive Gaussian noise, in which we
assume that the measurements $g_i$ are related to the true value by the relation

$$ g_i = f + \epsilon_i , \qquad i = 1, \dots, n , $$

where $\epsilon_i$ are independent random variables with given Gaussian
probability distributions of variance $\sigma_i$ and zero mean. In this case the
likelihood function is

$$ P(g_1, \dots, g_n|f) = \prod_{i=1}^{n} P(g_i|f) \propto \prod_{i=1}^{n} e^{-\beta_i (g_i - f)^2} \tag{1} $$

where $\beta_i = 1/\sigma_i^2$. Maximizing the likelihood function (1) therefore
corresponds to solving the following minimization problem:

$$ \min_f \sum_{i=1}^{n} \beta_i (g_i - f)^2 . \tag{2} $$

An elementary computation shows that the solution is the weighted average
of the data:

$$ f = \frac{\sum_{i=1}^{n} \beta_i g_i}{\sum_{i=1}^{n} \beta_i} . $$
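As a small illustration of the weighted average, the following sketch computes the ML estimate for a handful of measurements. The function name `ml_estimate` and the sample values are assumptions introduced only for this example; the weights follow the convention $\beta_i = 1/\sigma_i^2$ used in the text.

```python
def ml_estimate(g, sigma):
    """Weighted average of the data g, with weights beta_i = 1 / sigma_i**2."""
    beta = [1.0 / s ** 2 for s in sigma]
    return sum(b * x for b, x in zip(beta, g)) / sum(beta)


# Hypothetical data: three measurements, the last one much noisier.
g = [1.0, 1.2, 3.0]
sigma = [0.5, 0.5, 5.0]
f_hat = ml_estimate(g, sigma)

# With equal noise levels the estimate reduces to the arithmetic mean.
f_equal = ml_estimate([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

The noisy third measurement receives a small weight, so `f_hat` stays between the two reliable values; this only works because its noise level is assumed to be known in advance.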

The ML estimate has therefore a simple meaning and is easy to compute.
However, it is well known that estimates of this type are not “robust”,
that is they are very sensitive to the presence of outliers in the data. In
order to overcome this difficulty it has been proposed to use a modified
version of the minimization problem (2):


$$ \min_f \sum_{i=1}^{n} V(g_i - f) , \tag{3} $$

where the quadratic function $(g_i - f)^2$ has been substituted by some other
less rapidly increasing even function $V$. Estimators of this type are known
in statistics, for particular choices of $V$, as robust estimators (Huber, 1981).
The idea underlying (3) is that if the error $(g_i - f)^2$ is large, it is likely
that $g_i$ is an outlier, so that we do not want to force $f$ to be close to it.
Therefore the function $V$ should not increase much after a certain value.
Different shapes for $V$ have been proposed, and some of them are depicted in
figure (1).

Figure 1: Different choices for the function $V$. (1) $V(x)$ is quadratic for
$x < 0.5$ and then constant. (2) $V(x) = |x|$. (3) $V(x)$ is quadratic for
$x < 0.5$ and then linear, with continuous first derivative.
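The effect of replacing the quadratic function by a slowly growing $V$ can be sketched numerically. The three functions below are one possible reading of the curves of Figure 1 (the threshold 0.5 is taken from the caption, the scaling is an assumption), and the grid search in `robust_estimate` is a deliberately naive minimizer chosen for this example, not a method from the paper. The data are hypothetical.

```python
def v_truncated(x, c=0.5):
    """Quadratic for |x| < c, then constant."""
    return x * x if abs(x) < c else c * c


def v_abs(x):
    """Absolute value."""
    return abs(x)


def v_huber(x, c=0.5):
    """Quadratic for |x| < c, then linear with continuous first derivative."""
    return x * x if abs(x) < c else 2.0 * c * abs(x) - c * c


def robust_estimate(data, v, lo, hi, steps=10001):
    """Minimize sum_i v(g_i - f) over a uniform grid of candidate values f."""
    best_f, best_cost = lo, float("inf")
    for k in range(steps):
        f = lo + (hi - lo) * k / (steps - 1)
        cost = sum(v(g - f) for g in data)
        if cost < best_cost:
            best_f, best_cost = f, cost
    return best_f


data = [0.9, 1.0, 1.0, 1.1, 10.0]  # hypothetical sample with one outlier
mean = sum(data) / len(data)       # least-squares estimate, pulled by the outlier
estimates = {v.__name__: robust_estimate(data, v, 0.0, 10.0)
             for v in (v_truncated, v_abs, v_huber)}
```

On this sample the arithmetic mean is dragged far from the cluster around 1, while all three robust choices stay close to it. A grid search is used because the truncated quadratic is not convex, so a local descent method could stop in the wrong basin.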

In this paper we want to give a more rigorous justification for the use of
estimates like the one of eq. (3), and also to give an interpretation of the
model of noise to which they correspond. We will see that if the function
$e^{-V(\sqrt{x})}$ is completely monotone, then using eq. (3) corresponds to
assuming that our measurements are affected by a Gaussian noise whose variance
is a random variable with given probability distribution. Depending on the
probability distribution of the variance of the noise, different shapes for $V$
are obtained. For a particular choice of $V$ a justification of such a technique
was given in (Girosi, Poggio and Caprile, 1991), but no characterization was
given. In the next section we formalize these statements, while in the
following sections we present a large class of functions $V$ that can be used,
together with some examples.


2  Robust Maximum Likelihood Estimates

In order to simplify the notation we consider the problem presented in the
previous section in the case in which only one measurement $g$ is taken, since
this does not change the main conclusions. We therefore assume that

$$ g = f + \epsilon \tag{4} $$

where $\epsilon$ is a random variable whose distribution is Gaussian with zero
mean and variance $\sigma$. The likelihood function is therefore

$$ P(g|f) \propto \sqrt{\beta}\, e^{-\beta (g - f)^2} \tag{5} $$

where $\beta = 1/\sigma^2$.

When we compute the standard maximum likelihood estimate we are assuming
that the variance of the noise has a fixed value, but this assumption
is not always realistic. In fact, in many cases the accuracy of the
measurement apparatus can fluctuate, due to some external causes, and in these
cases our data can contain outliers. A more realistic assumption consists in
considering the variance of the Gaussian noise, and therefore $\beta$, as a
random variable, with given distribution $P(\beta)$. We are therefore led to
introduce the probability $P(g|f,\beta)$ of having observed the data $g$ if the
true value was $f$ and the variance of the noise was $\sigma = 1/\sqrt{\beta}$:

$$ P(g|f,\beta) \propto \sqrt{\beta}\, e^{-\beta (g - f)^2} . \tag{6} $$

Notice that the right hand side of eq. (6) is the same as that of eq. (5), but
their meaning is different. We can now compute the joint probability
$P(g,\beta|f)$ of having observed the data $g$ in presence of Gaussian noise
with variance $\sigma = 1/\sqrt{\beta}$ if the true value was $f$:

$$ P(g,\beta|f) = P(g|f,\beta)\, P(\beta) . \tag{7} $$

Since we are not interested in estimating $\beta$, but only in the probability
of $g$ given $f$, that is our likelihood function, we integrate equation (7)
over $\beta$ to obtain the effective noise distribution

$$ P_{\mathrm{eff}}(g|f) = \int_0^\infty P(g|f,\beta)\, P(\beta)\, d\beta . \tag{8} $$

The ML estimate is now obtained by maximizing the probability of eq. (8), or,
taking the negative of its logarithm, solving the following minimization
problem:

$$ \min_f V(g - f) \tag{9} $$

where we have defined the so called effective potential¹

$$ V(x) = -\ln \int_0^\infty e^{-\beta x^2} \sqrt{\beta}\, P(\beta)\, d\beta . \tag{10} $$

In the case in which $n$ observations $g_1, \dots, g_n$ have been taken the same
considerations apply, and assuming that the variances $\sigma_i$ of the
measurements $g_i$ all have the same probability distribution, we obtain,
instead of eq. (9):

$$ \min_f \sum_{i=1}^{n} V(g_i - f) . \tag{11} $$

This equation coincides with eq. (3), which has been proposed as a technique
to “robustize” the least-square estimate (2). In our case, however,
the effective potential $V$ derives from specific assumptions on how data are
corrupted by noise. If the distribution of the random variable $\beta$ is a
delta function centered on some value $\beta_0$, that is if
$P(\beta) = \delta(\beta - \beta_0)$, the noise model is Gaussian with fixed
variance, and the effective potential is a quadratic function, yielding the
same result as eq. (2). For other probability distributions $P(\beta)$,
formula (10) allows one to compute the corresponding effective potential
by simply performing a one dimensional integration. Conversely, in some
cases, given an effective potential $V(x)$, it is also possible to establish
whether there is any probability distribution $P(\beta)$ that corresponds to it.
In the next section we introduce a class of effective noise distributions for
which such a characterization can be given.
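As a quick check of formula (10), consider the fixed-variance case in which $P(\beta)$ is a delta function. The symbol $\beta_0$ is introduced here, as an assumption of this example, for the point on which the delta function is centered. The integral then collapses to a single term:

```latex
\begin{align*}
V(x) &= -\ln \int_0^\infty e^{-\beta x^2}\,\sqrt{\beta}\;\delta(\beta-\beta_0)\,d\beta \\
     &= -\ln\!\left(\sqrt{\beta_0}\,e^{-\beta_0 x^2}\right) \\
     &= \beta_0\,x^2 - \tfrac{1}{2}\ln\beta_0 .
\end{align*}
```

The additive constant $-\tfrac{1}{2}\ln\beta_0$ does not depend on $x$ and therefore does not affect the minimizer, so the quadratic problem is recovered.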

3  A  class  of  effective  noise  distributions 

In this section we study and characterize a class of effective noise
distributions. Since we want to maximize the effective noise distribution (8)
we are not interested in effective distributions that are unbounded. It will
turn out that if an effective noise distribution is bounded at the origin it is
also bounded on all the real axis. Therefore, according to eq. (8) we define
the bounded effective noise distributions as the probability distributions of
the form:

$$ f(x) = \int_0^\infty e^{-\beta x^2} \sqrt{\beta}\, P(\beta)\, d\beta \tag{12} $$

where $P(\beta)$ is a probability distribution, and such that the following
condition is satisfied:

$$ f(0) = \int_0^\infty \sqrt{\beta}\, P(\beta)\, d\beta < +\infty . $$

¹This name was previously introduced by Geiger and Girosi (1991), who used a
similar technique applied to the problem of surface reconstruction with
discontinuities.

We can now prove the following proposition:

Proposition 3.1 A probability distribution $f(x)$ is a bounded effective noise
distribution if and only if $f(\sqrt{x})$ is completely monotone.

Proof: (only if) Suppose $f(x)$ is a bounded effective noise distribution.
Then $f(\sqrt{x})$ can be represented as

$$ f(\sqrt{x}) = \int_0^\infty e^{-\beta x}\, d\mu(\beta) $$

where

$$ \mu(\beta) = \int_0^\beta \sqrt{\tau}\, P(\tau)\, d\tau . $$

Since $\mu(\beta)$ is clearly non decreasing and bounded, by Bernstein's
theorem on the representation of completely monotone functions (see
Appendix A), $f(\sqrt{x})$ is completely monotone.

(if) Suppose that the probability distribution $f(x)$ is such that
$f(\sqrt{x})$ is completely monotone. Then it can be represented as

$$ f(x) = \int_0^\infty e^{-\beta x^2}\, d\mu(\beta) , \tag{13} $$

with $\mu(\beta)$ non decreasing and bounded. Since $f$ is a probability
distribution its integral over the real axis has unit value, and therefore

$$ \int_{-\infty}^{+\infty} dx \int_0^\infty e^{-\beta x^2}\, d\mu(\beta) = 1 . $$

Exchanging the order of integration and evaluating the Gaussian integral,
we obtain that

$$ \sqrt{\pi} \int_0^\infty \frac{d\mu(\beta)}{\sqrt{\beta}} = 1 . $$

Therefore it is always possible to write

$$ d\mu(\beta) = P(\beta)\, \sqrt{\beta}\, d\beta , $$

where $P(\beta)$ is a probability distribution, being positive and having finite
integral. Substituting this expression in formula (13) we obtain the
representation of eq. (12). Noticing that completely monotone functions are
bounded at the origin, since

$$ f(0) = \int_0^\infty d\mu(\beta) < +\infty , $$

we conclude that $f(x)$ is a bounded effective noise distribution. □

We can now answer the question of whether effective potentials of the type
$V(x) = |x|^p$ can be derived in this framework. In fact, using the previous
proposition it is sufficient to check whether the probability distribution
$P(x) = e^{-|x|^p}$ is such that $P(\sqrt{x})$ is completely monotone. Using the
fact that the function $e^{-x^p}$ is completely monotone if and only if
$0 < p \le 1$ (Schoenberg, 1937) (see appendix A), we can immediately derive the
following proposition:

Proposition 3.2 The function $V(x) = |x|^p$ is the effective potential
associated to a bounded effective noise distribution if and only if
$0 < p \le 2$.
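The step from Schoenberg's result to Proposition 3.2 can be spelled out as follows. This is only a restatement of the argument above, with $f$ denoting the candidate effective noise distribution and $q$ an auxiliary exponent introduced for this example; the endpoint $q = 1$ is the plain exponential, which corresponds to $p = 2$, the Gaussian with fixed variance.

```latex
\begin{align*}
f(x) = e^{-|x|^{p}}
  \;&\Longrightarrow\; f(\sqrt{x}) = e^{-x^{p/2}}, \qquad x > 0, \\
e^{-x^{q}} \text{ completely monotone}
  \;&\Longleftrightarrow\; 0 < q \le 1, \\
\text{so with } q = p/2: \quad
f(\sqrt{x}) \text{ completely monotone}
  \;&\Longleftrightarrow\; 0 < p \le 2 .
\end{align*}
```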


We notice that if we set $p = 1$ in the proposition above we obtain as
effective potential the usual $L_1$ error measure, that is $V(x) = |x|$.
However, since the absolute value function is not differentiable at the origin
it has been proposed to use functions that behave quadratically in a
neighborhood of the origin, and linearly for large values of the argument
(Eubank, 1988). Effective potentials of the form $V(x) = |x|^p$ with $p \ge 1$
are interesting, since they are convex and the problem of maximizing the
likelihood function has therefore only one solution. However, before showing
what the effective noise distributions associated to these effective
potentials are, we present a simpler example, which gives a non convex
effective potential that has also been used in practice.

4  A class of non convex effective potentials

We have already seen that if the distribution $P(\beta)$ is a delta function the
standard quadratic potential is obtained. The simplest non trivial case
consists in assuming that $P(\beta)$ is a sum of two delta functions, that is

$$ P(\beta) = (1 - \epsilon)\, \delta(\beta - \beta_1) + \epsilon\, \delta(\beta - \beta_2) \tag{14} $$

where $\epsilon$ is a parameter between 0 and 1 and $\beta_1$, $\beta_2$ are
fixed positive numbers, $\beta_1 > \beta_2$. If $\beta_2$ is a very small number
such a distribution can represent the a priori knowledge that a fraction
$\epsilon$ of the data is very unreliable. In the limit of $\beta_2$ going to
zero this fraction of data is constituted by genuine outliers, and we therefore
analyze the model keeping in mind that we are interested in this limit.

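With the noise distribution given by eq. (14) the effective potential
becomes

$$ V(x) = -\ln \int_0^\infty \left[ (1 - \epsilon)\, \delta(\beta - \beta_1) + \epsilon\, \delta(\beta - \beta_2) \right] \sqrt{\beta}\, e^{-\beta x^2}\, d\beta \tag{15} $$

and, after some algebra:

$$ V(x) = \beta_1 x^2 - \ln \left[ 1 + \frac{\epsilon}{1 - \epsilon} \sqrt{\frac{\beta_2}{\beta_1}}\; e^{(\beta_1 - \beta_2) x^2} \right] , \tag{16} $$

where we have neglected unimportant constant terms.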

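We start by studying the behavior of the potential in a neighborhood of the
origin. Taking a Taylor expansion of eq. (15) up to the second order, after
some algebra we find that

$$ V(x) = V(0) + \frac{(1 - \epsilon)\, \beta_1^{3/2} + \epsilon\, \beta_2^{3/2}}{(1 - \epsilon)\, \beta_1^{1/2} + \epsilon\, \beta_2^{1/2}}\; x^2 + o(x^2) $$

so that the potential is initially quadratic. If $\beta_2$ is small, that is if
we assume that outliers are present in the data, the coefficient of $x^2$ is
close to $\beta_1$.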

When $x$ goes to infinity the exponential term in the logarithm of eq. (16)
grows very fast (remember that $\beta_1 > \beta_2$), and the unit term can be
omitted, leading to major simplifications. This is true only if $\beta_2$ is
“not too small”, in the sense that the following inequality has to be verified:

$$ (\beta_1 - \beta_2)\, x^2 \gg -\ln k(\epsilon, \beta_1) - \tfrac{1}{2} \ln \beta_2 \tag{17} $$

where $k(\epsilon, \beta_1)$ is a constant that depends only on $\epsilon$ and
$\beta_1$, whose exact form is irrelevant to us. In the region where this
condition is satisfied we therefore obtain:

$$ V(x) \approx \beta_1 x^2 - \ln \left[ k(\epsilon, \beta_1)\, \sqrt{\beta_2}\; e^{(\beta_1 - \beta_2) x^2} \right] $$

and therefore:

$$ V(x) \approx \beta_2 x^2 - \ln k(\epsilon, \beta_1) - \tfrac{1}{2} \ln \beta_2 . $$

For large values of $x$ and small values of $\beta_2$, where “large” and “small”
have to be intended in the sense of condition (17), the effective potential is
again quadratic and very flat.

Figure 2: The non convex effective potential of the model above for different
values of $\beta_2$ (curves for $\beta_2 = 0.01$, $0.05$ and $0.10$).

In summary: for small values of $x$ the potential is a parabola whose
curvature is close to $\beta_1$; for large values of $x$ it is a very flat
parabola of curvature $\beta_2$, translated by a positive amount that grows
logarithmically as $\beta_2$ decreases; in between, since its first derivative
is strictly positive, it smoothly connects these two behaviours. In fig. (2) we
show the shape of the effective potential for fixed $\epsilon$ and $\beta_1$,
for three different values of $\beta_2$. We set $\epsilon = 0.1$,
$\beta_1 = 4$, and $\beta_2 \in \{0.1, 0.05, 0.01\}$. This amounts to saying
that we know a priori that 90% of the data points are affected by Gaussian
noise of variance equal to 0.5 (that is $\beta_1 = 4$). The other 10% is
affected by Gaussian noise with very large variance, that is
$\sigma = 3.16, 4.47, 10$. We notice that for a value of $\beta_2 = 0.01$, which
corresponds to a variance $\sigma = 10$, the effective potential is extremely
flat, almost constant, away from the origin. A similar behavior is expected: in
fact it means that when the interpolation error is larger than a threshold its
influence on the solution is not taken into account anymore, and this is
exactly the kind of motivation that led statisticians to consider robust
models.
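The two regimes of the two-delta model can be checked numerically. The sketch below evaluates the effective potential directly from its definition for a mixture of two delta functions; the function name `effective_potential` and the evaluation points are assumptions made for this example, while the parameter values are the ones quoted in the text.

```python
import math


def effective_potential(x, eps, beta1, beta2):
    """V(x) = -ln[(1-eps) sqrt(beta1) e^{-beta1 x^2} + eps sqrt(beta2) e^{-beta2 x^2}]."""
    inlier = (1.0 - eps) * math.sqrt(beta1) * math.exp(-beta1 * x * x)
    outlier = eps * math.sqrt(beta2) * math.exp(-beta2 * x * x)
    return -math.log(inlier + outlier)


eps, beta1, beta2 = 0.1, 4.0, 0.01  # values quoted in the text

# Coefficient of x^2 near the origin, estimated by a finite difference.
h = 1e-3
curvature0 = (effective_potential(h, eps, beta1, beta2)
              - effective_potential(0.0, eps, beta1, beta2)) / h ** 2

# Far from the origin only the outlier term survives inside the logarithm.
x_far = 3.0
v_far = effective_potential(x_far, eps, beta1, beta2)
v_asymptote = beta2 * x_far ** 2 - math.log(eps * math.sqrt(beta2))
```

With these values the coefficient of $x^2$ at the origin comes out close to $\beta_1 = 4$, while at $x = 3$ the potential is indistinguishable from the flat parabola $\beta_2 x^2$ shifted by a constant. The potential therefore rises steeply up to a threshold and then almost stops growing.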

5  A  class  of  convex  effective  potentials 

We now consider an effective potential of the form

$$ V(x) = \sqrt{\alpha^2 + x^2} , \tag{18} $$

where $\alpha$ is some given parameter, possibly zero. Functions of this type
are well known in approximation theory by the name of “multiquadrics”, or
“Hardy's multiquadrics”, and their behavior is shown in fig. (3). Potentials
of this shape are interesting because they are convex, so that the
minimization problem associated with them has a unique solution. Moreover,
potentials with a shape very similar to this one can be implemented in analog
VLSI circuits (Harris, 1990), allowing very fast ways to solve the estimation
problems.

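We are interested in finding the probability distribution that leads to this
form of effective potential. A solution to this problem certainly exists, since
it is easy to show that $e^{-\sqrt{\alpha^2 + x}}$ is completely monotonic.
Therefore we have to find a function $P(\beta)$ such that

$$ e^{-\sqrt{\alpha^2 + x^2}} = \int_0^\infty e^{-\beta x^2} \sqrt{\beta}\, P(\beta)\, d\beta . $$

This is in essence the problem of computing an inverse Laplace transform.
We start from the following identity (Gradshteyn and Ryzhik, 1981):

$$ 2 \sqrt{\pi}\, e^{-\sqrt{t}} = \int_0^\infty \beta^{-3/2}\, e^{-\frac{1}{4\beta}}\, e^{-\beta t}\, d\beta \tag{19} $$

and perform the substitution $t \to x^2 + \alpha^2$, obtaining

$$ 2 \sqrt{\pi}\, e^{-\sqrt{x^2 + \alpha^2}} = \int_0^\infty \beta^{-3/2}\, e^{-\frac{1}{4\beta} - \alpha^2 \beta}\, e^{-\beta x^2}\, d\beta . \tag{20} $$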

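Making the proper identifications in the equation above, and leaving aside the
normalization factor, which depends only on $\alpha$, we obtain as a result:

$$ P(\beta) \propto \beta^{-2}\, e^{-\frac{1}{4\beta} - \alpha^2 \beta} . \tag{21} $$

Figure 3: The multiquadric effective potential $V(x) = (\alpha^2 + x^2)^{1/2}$
for three different values of the parameter $\alpha$.

We can now derive the distribution $P(\sigma)$ of the variance
$\sigma = 1/\sqrt{\beta}$ by imposing

$$ P(\beta)\, |d\beta| = P(\sigma)\, d\sigma . \tag{22} $$

After some algebra we obtain the function

$$ P(\sigma) \propto \sigma\, e^{-\frac{\sigma^2}{4} - \frac{\alpha^2}{\sigma^2}} , $$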

whose shape is depicted in fig. (4) for three different values of $\alpha$. We
notice that when $\alpha$ increases the distribution becomes more peaked, and
also flatter around the origin. Therefore the probability of having low-noise
data decreases when $\alpha$ increases. Equivalently, we can also say that the
probability of having data with noise larger than a given threshold increases
with $\alpha$.
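The Laplace-type identity behind the multiquadric potential can be verified numerically. The sketch below integrates $\beta^{-3/2} e^{-1/(4\beta)} e^{-\beta(\alpha^2 + x^2)}$ over $\beta$ with a plain trapezoidal rule after the substitution $\beta = e^u$, and compares the result with $2\sqrt{\pi}\, e^{-\sqrt{\alpha^2 + x^2}}$. The function names, the integration range and the number of steps are assumptions chosen for this example; the truncation is adequate only when $\alpha^2 + x^2$ is not close to zero.

```python
import math


def mixture_integral(x, alpha, u_min=-12.0, u_max=12.0, steps=4000):
    """Trapezoidal estimate of the integral over beta in (0, inf) of
    beta^{-3/2} * exp(-1/(4 beta)) * exp(-beta * (alpha^2 + x^2)).

    The substitution beta = exp(u) maps the half line onto the real axis.
    """
    t = alpha * alpha + x * x
    h = (u_max - u_min) / steps
    total = 0.0
    for k in range(steps + 1):
        beta = math.exp(u_min + k * h)
        # Integrand in u: beta^{-3/2} * exp(...) times d(beta)/du = beta.
        value = beta ** -0.5 * math.exp(-0.25 / beta - beta * t)
        total += value if 0 < k < steps else 0.5 * value
    return total * h


def multiquadric_side(x, alpha):
    """Closed form 2 sqrt(pi) exp(-sqrt(alpha^2 + x^2))."""
    return 2.0 * math.sqrt(math.pi) * math.exp(-math.sqrt(alpha * alpha + x * x))


alpha = 1.0
errors = [abs(mixture_integral(x, alpha) - multiquadric_side(x, alpha))
          for x in (0.0, 1.0, 2.0)]
```

The entries of `errors` should be negligible compared with the values being compared, which confirms that a Gaussian with $\beta$ distributed proportionally to $\beta^{-2} e^{-1/(4\beta) - \alpha^2 \beta}$ produces the multiquadric potential up to an additive constant.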

6  Conclusions 

Figure 4: The distribution of variance $P(\sigma)$ associated to the
multiquadric effective potential.

We have shown that it is possible to give a simple interpretation to estimators
based on the solution of the minimization problem

$$ \min_f \sum_{i=1}^{n} V(g_i - f) \tag{23} $$

where $V$ is an appropriate function, which we call effective potential. If the
function $e^{-V(\sqrt{x})}$ is completely monotone, using these robust
estimators corresponds to computing Maximum Likelihood estimators under the
assumption that data are corrupted by Gaussian noise whose variance fluctuates
according to a given probability distribution, which uniquely determines $V$.
Typical “effective potentials” $V$ that have been used in the past belong to
the class we consider.

We notice that the result we derived holds also in the more general context of
parametric and non parametric regression. In order to see why, let
$D = \{(x_i, y_i) \in R^n \times R\}_{i=1}^{N}$ be a set of data that has been
obtained by randomly sampling a multivariate function $f$ in presence of noise.
In parametric regression we assume that $f$ is a parametric function $h(x; p)$,
where $p \in R^m$, and the optimal set of parameters $p$ is usually recovered by
minimizing the least square error

$$ \min_{p \in R^m} \sum_{i=1}^{N} \left( y_i - h(x_i; p) \right)^2 . \tag{24} $$

As in the case considered in this paper, this can be thought of as a maximum
likelihood estimator, under the assumption of Gaussian noise with fixed
variance. Therefore the same argument we applied in section (2) applies
here, and more robust estimates could be obtained if we replace the quadratic
function in eq. (24) with an effective potential $V$.
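A sketch of what replacing the quadratic function looks like in practice for a straight-line model $h(x; p) = a x + b$ with the multiquadric potential. The minimization is done here by iteratively reweighted least squares, a standard technique that is not discussed in the paper and is used only as a convenient solver for this example; the data, the function names and the value of `alpha` are likewise assumptions.

```python
import math


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares fit of y = a * x + b."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    a = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    b = (sy - a * sx) / sw
    return a, b


def robust_line_fit(xs, ys, alpha=0.1, iterations=50):
    """Minimize sum_i sqrt(alpha^2 + r_i^2), with r_i = y_i - (a x_i + b)."""
    ws = [1.0] * len(xs)  # the first pass is ordinary least squares
    for _ in range(iterations):
        a, b = weighted_line_fit(xs, ys, ws)
        # For the multiquadric potential, V'(r) / r = 1 / sqrt(alpha^2 + r^2).
        ws = [1.0 / math.sqrt(alpha ** 2 + (y - (a * x + b)) ** 2)
              for x, y in zip(xs, ys)]
    return a, b


xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[5] += 20.0  # one gross outlier on an otherwise exact line
a_ls, b_ls = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_rob, b_rob = robust_line_fit(xs, ys)
```

On this synthetic data the least-squares intercept is shifted by more than one unit by the single outlier, while the multiquadric fit stays close to the line $y = 2x + 1$. Because the multiquadric potential is convex, the reweighting iteration does not depend on a good starting point.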

In non parametric regression no assumption is made on the specific form
of $f$, and a common technique consists in solving the following minimization
problem:

$$ \min_f \sum_{i=1}^{N} \left( f(x_i) - y_i \right)^2 + \lambda S[f] , $$

where $S[f]$ is an appropriate convex functional of $f$ and $\lambda$ a positive
number. This corresponds to computing the Maximum A Posteriori estimator, under
the assumption of Gaussian noise and a priori probability for $f$ given by
$P(f) \propto e^{-\lambda S[f]}$. If we assume that the variance of the Gaussian
noise is a random variable, using the same argument we used in section (2) we
can prove that the Maximum A Posteriori estimator solves the following
minimization problem:

$$ \min_f \sum_{i=1}^{N} V(f(x_i) - y_i) + \lambda S[f] , \tag{25} $$

where $V$ is an effective potential. Estimators of this type are known in the
statistical literature as M-type smoothing splines (Eubank, 1988), and their
implementation in analog VLSI circuits has been considered by J. Harris
(1991) for some choices of the functional $S[f]$.

A  Completely  Monotone  Functions 

We need to give the following:

Definition A.1 A function $f$ is said to be completely monotonic on
$(0, \infty)$ provided that it is $C^\infty(0, \infty)$ and
$(-1)^l \frac{\partial^l f}{\partial x^l}(x) \ge 0$, $\forall x \in (0, \infty)$,
$\forall l \in \mathcal{N}$, where $\mathcal{N}$ is the set of natural numbers.

A typical example of completely monotone function is the exponential
function $f(x) = e^{-\alpha x}$, with $\alpha \ge 0$. It turns out that all the
completely monotone functions are linear superpositions with positive
coefficients of scaled exponentials, as the following theorem of Bernstein
shows:

Theorem A.1 (Bernstein, 1929) The class of completely monotone functions
is identical with the class of functions of the form

$$ g(x) = \int_0^\infty e^{-\beta x}\, d\mu(\beta) , $$

where $\mu(\beta)$ is non-decreasing and bounded for $\beta \ge 0$.